Abstract


Replication of the coronavirus genome requires continuous RNA synthe- 
sis, whereas transcription is a discontinuous process unique among RNA 
viruses. Transcription includes a template switch during the synthesis of 
subgenomic negative-strand RNAs to add a copy of the leader sequence. 
Coronavirus transcription is regulated by multiple factors, including the 
extent of base-pairing between transcription-regulating sequences of positive
and negative polarity, viral and cell protein-RNA binding, and high-order
RNA-RNA interactions. Coronavirus RNA synthesis is performed
by a replication-transcription complex that includes viral and cell proteins 
that recognize cis-acting RNA elements mainly located in the highly struc- 
tured 5’ and 3’ untranslated regions. In addition to many viral nonstructural 
proteins, the presence of cell nuclear proteins and the viral nucleocapsid 
protein increases virus amplification efficacy. Coronavirus RNA synthesis is 
connected with the formation of double-membrane vesicles and convoluted 
membranes. Coronaviruses encode proofreading machinery, unique in the 
RNA virus world, to ensure the maintenance of their large genome size. 
CORONAVIRUS REPLICATION AND TRANSCRIPTION


Coronaviruses are enveloped, positive-strand RNA viruses with genomes approximately 30 kb in 
length that belong to the family Coronaviridae in the order Nidovirales (1). Coronaviruses infect a 
wide variety of mammalian and avian species, in most cases causing respiratory and intestinal tract 
disease. Human coronaviruses (HCoVs), such as HCoV-229E, HCoV-OC43, HCoV-NL63, and 
HKU1, have long been recognized as major causes of the common cold. Two recent HCoVs,
severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome
coronavirus (MERS-CoV), emerged in 2002 and 2012, respectively, causing life-threatening dis- 
ease in humans (2). In addition, novel animal coronaviruses, such as the porcine deltacoronavirus 
(PDCoV) (3) and the porcine epidemic diarrhea virus (PEDV) (4), have recently emerged, causing 
great economic loss in China and the United States. 

The 5’-proximal two-thirds of the coronavirus genome encodes the replicase gene, which 
contains two open reading frames, ORF 1a and ORF 1b (Figure 1a). Translation of ORF 1a yields 
polyprotein 1a (pp1a), and -1 ribosomal frameshifting allows translation of ORF 1b to yield pp1ab
(5, 6). Together, these polyproteins are co- and posttranslationally processed into 16 nonstructural 
proteins (nsps), most of them driving viral genome replication and subgenomic mRNA (sgmRNA) 
synthesis (Figure 1a). The 3’ third of the genome encodes the structural and accessory proteins, 
which vary in number among the different coronaviruses (Figure 1a) (1). 

Coronavirus RNA-dependent RNA synthesis includes two differentiated processes: genome 
replication, yielding multiple copies of genomic RNA (gRNA), and transcription of a collection 
of sgmRNAs that encode the viral structural and accessory proteins (7, 8). 

Like that of other positive-strand RNA viruses, coronavirus genome replication is a process 
of continuous synthesis that utilizes a full-length complementary negative-strand RNA as the 
template for the production of progeny virus genomes. The initiation of negative-strand synthesis 
involves access of the RNA-dependent RNA polymerase (RdRp) to the 3’ terminus of the genome, 
promoted by 3’-end RNA sequences and structures (5). There is evidence that both 5’- and 3’- 
end RNA elements are required for the production of progeny positive-strand RNA from the 
intermediate negative-strand RNA, suggesting that interactions between the 5’ and 3’ ends of the 
genome contribute to replication (9). 

In contrast to replication, coronavirus transcription includes a discontinuous step during the 
production of sgmRNAs (10, 11). This process, unique among known RNA viruses, is a hallmark 
of the order Nidovirales and ultimately generates a nested set of sgmRNAs that are 5’ and 3’
coterminal with the virus genome (Figure 1b). These sgmRNAs all include at their 5’ end a 
common leader sequence, whose length ranges from 65 to 98 nt in different coronaviruses (12). 
This common leader sequence is present only once at the very 5’ end of the genome, which 
implies that sgmRNAs are synthesized by the fusion of noncontiguous sequences, the leader and 
the 5’ end of each mRNA coding sequence, called the body (B). The transcription mechanism 
in coronaviruses is seemingly complicated as compared with the transcription mechanisms in 
other positive-strand RNA viruses, such as internal initiation and premature termination (13). 
In fact, in contrast to coronavirus and arterivirus sgmRNAs, subgenomic transcripts of other 
Nidovirales, such as toroviruses and roniviruses, do not have a common 5’ leader sequence 
(14). This observation raises the question of whether the presence of the leader sequence in 
coronavirus sgmRNAs provides any selective advantage to the virus. The presence of the 5’ leader 
sequence was shown to protect SARS-CoV mRNAs from nsp1-induced endonucleolytic cleavage 
of capped mRNAs, providing a strategy for the efficient accumulation of viral mRNAs and viral 
proteins during infection (15). Moreover, as noted below, the complement of the leader sequence 
Figure 1


Coronavirus genome structure and gene expression. (a) Coronavirus genome structure. The upper scheme represents the TGEV
genome. Labels indicate gene names; L corresponds to the leader sequence. Also represented are the nsps derived from processing of
the pp1a and pp1ab polyproteins. PLP1, PLP2, and 3CL protease sites are depicted as inverted triangles with the corresponding color
code of each protease. Dark gray rectangles represent transmembrane domains, and light gray rectangles indicate other functional 
domains. (b) Coronavirus genome strategy of sgmRNA expression. The upper scheme represents the TGEV genome. Short lines 
represent the nested set of sgmRNAs, each containing a common leader sequence (black) and a specific gene to be translated (dark gray).
(c) Key elements in coronavirus transcription. A TRS precedes each gene (TRS-B) and includes the core sequence (CS-B) and variable 
5’ and 3’ flanking sequences. The TRS of the leader (TRS-L), containing the core sequence (CS-L), is present at the 5’ end of the 
genome, in an exposed location (orange box in the TRS-L loop). Discontinuous transcription occurs during the synthesis of the 
negative-strand RNA (light blue), when the copy of the TRS-B hybridizes with the TRS-L. Dotted lines indicate the complementarity
between positive-strand and negative-strand RNA sequences. Abbreviations: EndoU, endonuclease; ExoN, exonuclease; HEL, helicase; 
MTase, methyltransferase (green, N7-methyltransferase; dark purple, 2'-O-methyltransferase); nsp, nonstructural protein; PLP, 
papain-like protease; RdRp, RNA-dependent RNA polymerase; sgmRNA, subgenomic mRNA; TGEV, transmissible gastroenteritis
virus; TRS, transcription-regulating sequence; UTR, untranslated region. 
supports initiation of positive-strand RNA synthesis, making the negative-strand subgenomic
RNAs (sgRNAs) a template for further amplification of positive-strand sgmRNAs. 
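
The composition of the nested set of sgmRNAs can be illustrated with a toy string model. The sketch below is only an illustration of the end product (a common 5' leader fused to a body that runs to the genome 3' end); it does not model the negative-strand synthesis during which the fusion actually occurs. The toy genome, the example core sequence, and the function names `find_all` and `build_sgmrna` are hypothetical and are not taken from any real coronavirus genome.

```python
def find_all(sequence, motif):
    """Return the start index of every occurrence of motif in sequence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def build_sgmrna(genome, cs, body_index):
    """Return a toy sgmRNA: the leader (through CS-L) fused to the body.

    genome     -- positive-strand toy sequence, written 5'->3'
    cs         -- core sequence shared by the leader and every body TRS
    body_index -- which CS-B to use (1 = first CS copy after the CS-L)
    """
    positions = find_all(genome, cs)
    if not 1 <= body_index < len(positions):
        raise ValueError("no such CS-B in this toy genome")
    # Leader: everything up to and including the first CS copy (CS-L).
    leader = genome[: positions[0] + len(cs)]
    # Body: everything downstream of the chosen CS-B, through the 3' end.
    body = genome[positions[body_index] + len(cs):]
    return leader + body


# Hypothetical toy genome: leader, CS-L, "replicase", then two genes,
# each preceded by a CS-B, and a short poly(A) tail.
TOY_CS = "CUAAAC"
TOY_GENOME = ("AUGGA" + TOY_CS + "GGGUUUGGG"
              + TOY_CS + "AUGCCC"
              + TOY_CS + "AUGUUU" + "AAAA")

if __name__ == "__main__":
    for k in (1, 2):
        print(k, build_sgmrna(TOY_GENOME, TOY_CS, k))
```

In this toy model every product starts with the same leader and ends at the same 3' terminus, and each shorter sgmRNA body is contained in the longer ones, which is what "nested" and "coterminal" mean here.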


RNA Sequences Regulating Transcription 


The transcription process is controlled by transcription-regulating sequences (TRSs) located at the 
3’ end of the leader sequence (TRS-L) and preceding each viral gene (TRS-B) (Figure 1c). TRSs 
include a conserved core sequence (CS) 6-7 nt in length and variable 5’ and 3’ flanking sequences 
(the 5’ TRS and 3’ TRS, respectively) (16). Because the CS is identical for the genome leader (CS- 
L) and all mRNA coding sequences (CS-B), the CS-L could base-pair with the nascent negative 
strand complementary to each CS-B (cCS-B), allowing for leader-body joining (Figure 1c). By 
engineering the base-pairing between the CS-L and the cCS-B in infectious genomic cDNAs of 
coronaviruses (17) and arteriviruses (18, 19), it was formally demonstrated that (a) the discontinuous
step of transcription occurs during the synthesis of the negative-strand RNA, and (b) base-pairing
between the CS-L and the cCS-B is required to drive the template switch of the nascent
negative-strand RNA to the leader. Additionally, the stability (free energy, ΔG) of the extended
duplex between the TRS-L and the complement of the TRS-B (cTRS-B), including 5’ and 3’ TRS
flanking sequences, was confirmed as a critical regulatory factor for the synthesis of sgmRNAs (20,
21).
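
As a rough illustration of the base-pairing between the TRS-L and the cTRS-B, the sketch below counts Watson-Crick pairs in an ungapped, antiparallel alignment. This count is only a crude stand-in for duplex stability: a real free energy (ΔG) estimate would require nearest-neighbor thermodynamic parameters, which are not modeled here. The example sequences and the function names are hypothetical.

```python
# Watson-Crick partners for RNA bases (G-U wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def paired_positions(trs_l, trs_b):
    """Count Watson-Crick pairs between the TRS-L and the cTRS-B.

    trs_l -- leader TRS on the positive strand, 5'->3'
    trs_b -- body TRS on the positive strand, 5'->3' (same length)

    The cTRS-B is the copy of the TRS-B in the nascent negative strand,
    so it is the reverse complement of trs_b. The two strands are paired
    antiparallel, without gaps or bulges.
    """
    if len(trs_l) != len(trs_b):
        raise ValueError("this sketch only handles equal-length TRSs")
    ctrs_b = reverse_complement(trs_b)
    n = len(trs_l)
    # Base i of the TRS-L faces base (n - 1 - i) of the cTRS-B.
    return sum(
        1 for i in range(n) if COMPLEMENT[trs_l[i]] == ctrs_b[n - 1 - i]
    )
```

Because the cTRS-B is complementary to the TRS-B, the count equals the number of positions at which the TRS-L and the TRS-B are identical; extending the compared window beyond the core sequence is how the flanking sequences enter the comparison.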

Coronavirus transcription resembles high-frequency, similarity-assisted copy-choice RNA re- 
combination, requiring sequence identity between donor and acceptor RNAs and hairpin struc- 
tures present in the acceptor RNA (22), in which the TRS-L would act as an acceptor for the 
cTRS-B donor sequence (Figure 1c). Secondary structure analysis of the TRS-L region from 
transmissible gastroenteritis virus (TGEV) (23) and bovine coronavirus (BCoV) (24) showed that 
the CS-L is exposed in the loop of a structured hairpin that is relevant for replication and tran- 
scription (23). These observations provided experimental evidence for the selection of the TRS-L 
during the template switch, excluding other genome TRS-Bs that contain the CS. Only the CS-L, 
located in a sequence context with optimal secondary structure and stability for template switching, 
may act as a landing site for the newly synthesized negative-strand RNA. 

The coronavirus discontinuous transcription process implies a premature termination during 
the synthesis of the negative-strand RNAs and a template switch of the nascent negative-strand 
RNA to the leader (Figure 1c). This switch requires long-distance RNA-RNA interactions, prob- 
ably assisted by RNA-protein complexes that would bring into close proximity the 5’-end TRS-L 
and the TRS-B preceding each gene. These complexes, presumably formed prior to the template 
switch, might contribute to the stoppage of negative-strand RNA synthesis at the TRS-B (7). In 
TGEV, two intragenomic, long-distance RNA-RNA interactions have been described to regulate
the transcription of sgmRNA N [coding for the nucleocapsid protein (N protein)], which is
the most abundant sgmRNA during viral infection despite its low ΔG value for TRS-L-TRS-B
duplex formation (25). The first interaction is established between two complementary 9-nt cis- 
acting elements preceding the CS of the N gene, the proximal element (pE) and the distal element 
(dE) (Figure 2), which are located 7 and 449 nt upstream of the CS-N, respectively (25). The 
amount of sgmRNA N produced is directly proportional to the extent of the complementarity
between pE and dE and inversely proportional to the distance between them (26). This interaction 
is probably necessary to relocate the active domain, another cis-acting RNA motif, consisting of a 
173-nt region at the 5’ flank of dE, immediately preceding the CS-N (Figure 2) (26). The second 
long-distance RNA-RNA interaction is held between a 10-nt sequence within the active domain 
and a complementary RNA motif located at the 5’ end of the viral genome (nucleotides 477 to 
486), more than 25,000 nt apart (27), and represents the longest-distance RNA-RNA interaction 
Figure 2


Model for the formation of genome high-order structures regulating N gene transcription. The upper linear 
scheme represents the coronavirus genome. The red line indicates the leader sequence in the 5’ end of the 
genome. The hairpin indicates the TRS-L. The gray line with arrowheads represents the nascent 
negative-sense RNA. The curved blue arrow indicates the template switch to the leader sequence during 
discontinuous transcription. The orange line represents the copy of the leader added to the nascent RNA 
after the template switch. The RNA-RNA interactions between the pE (nucleotides 26894 to 26903) and dE 
(nucleotides 26454 to 26463) and between the B-M in the active domain (nucleotides 26412 to 26421) and 
the cB-M in the 5’ end of the genome (nucleotides 477 to 486) are represented by solid lines. Dotted lines 
indicate the complementarity between positive-strand and negative-strand RNA sequences. Abbreviations: 
AD, active domain secondary structure prediction; B-M, B motif; cB-M, complementary copy of the B-M; 
cCS-N, complementary copy of the CS-N; CS-L, conserved core sequence of the leader; CS-N, conserved 
core sequence of the N gene; dE, distal element; pE, proximal element; TRS-L, transcription-regulating 
sequence of the leader. For an animated version of the model, see Video 1 or download a PowerPoint 
slideshow. 


reported so far in the RNA virus world (Figure 2). This interaction could bring into physical 
proximity the leader sequence, at the genome 5’ end, and the TRS-N, which would promote the 
template switch during synthesis of the negative-strand sgRNAs (Figure 2). This long-distance 
RNA-RNA interaction provided for the first time experimental support of the physical proxim- 
ity between the TRS-L and a TRS-B during discontinuous transcription in order to promote 
efficient RdRp transfer. The secondary structure of the active domain and the high-order struc- 
ture formed by the RNA-RNA interactions could also promote the slowdown and stoppage of 
the transcription complex at the CS-N, as described for tombusvirus transcription (28). The se- 
quences and secondary structures of the RNA motifs involved in these long-distance interactions 
are conserved among members of the species Alphacoronavirus 1, suggesting a functional similarity
(27).
Figure 3


Three-step model of coronavirus transcription. (a) Complex formation. Proteins binding transcription-regulating sequences are
represented by ellipsoids, the leader sequence is indicated with a red bar, and core sequences are indicated with orange boxes.
(b) Base-pairing scanning. Negative-strand RNA is shown in light blue; the transcription complex is represented by a hexagon.
Vertical lines indicate complementarity between the genomic RNA and the nascent negative strand. (c) Template switch. Due to the
complementarity between the newly synthesized negative-strand RNA and the transcription-regulating sequence of the leader,
template switch to the leader is made by the transcription complex to complete the copy of the leader sequence.
Coronavirus Transcription Model


Experimental data on transcription in coronaviruses (7, 17, 21, 25, 27) and the related arteriviruses
(14) can be integrated into a transcription model that includes three steps (Figure 3): (a) First, 
gRNA forms transcription initiation precomplexes, bringing into physical proximity the distal 
TRS-L and TRS-B. RNA-RNA, RNA-protein, and protein-protein interactions might maintain
these precomplexes in a dynamic equilibrium. (b) These precomplexes act as slowdown and
detachment signals for the transcription complex during the synthesis of negative-strand RNA.
(c) Once the TRS-B has been copied, if the ΔG of duplex formation between the cTRS-B (in the
nascent negative-strand RNA) and the TRS-L exceeds a minimum threshold, a template switch to
the leader takes place, adding a copy of the TRS-L to complete the negative-strand sgRNA. These
negative-strand sgRNAs subsequently serve as templates for the synthesis of multiple copies of 
sgmRNAs. 
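
One consequence of this model is that a transcription complex copying the genome from its 3' end meets the TRS-Bs in order, and at each one it either switches to the leader or reads through. The sketch below works out how a population of complexes would be partitioned under a deliberately simple assumption that is not part of the model itself: each TRS-B has a fixed, independent switch probability. The probabilities and the function name `switch_fractions` are hypothetical.

```python
def switch_fractions(switch_probs):
    """Partition transcription complexes among the TRS-Bs they encounter.

    switch_probs -- hypothetical probability of a template switch at each
                    TRS-B, listed in the order the complex meets them
                    during negative-strand synthesis (3'-most TRS-B first)

    Returns (fractions, full_length): the fraction of complexes switching
    at each TRS-B, and the fraction that read through every TRS-B and
    would therefore yield a full-length negative strand.
    """
    remaining = 1.0  # fraction of complexes still copying the genome
    fractions = []
    for p in switch_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        fractions.append(remaining * p)  # complexes that switch here
        remaining *= 1.0 - p             # complexes that read through
    return fractions, remaining


if __name__ == "__main__":
    fractions, full_length = switch_fractions([0.5, 0.5, 0.0])
    print(fractions, full_length)
```

Under this assumption, two TRS-Bs with the same switch probability still give unequal sgRNA amounts, because fewer complexes reach the TRS-B that lies farther from the 3' end. A TRS-B whose duplex with the TRS-L falls below the stability threshold corresponds to a probability of zero.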


REGULATION OF CORONAVIRUS PROTEIN 
STOICHIOMETRIC RATIOS 


Viruses have developed diverse strategies to ensure the optimal expression ratio of each viral 
protein. In the case of coronaviruses, replicase proteins are expressed from a full-length gRNA 
by translation of two polyproteins that are proteolytically cleaved. In contrast, structural and
accessory proteins are expressed from a nested set of sgmRNAs. Therefore, the abundance of 
each sgmRNA must be tightly regulated during the discontinuous transcription process to ensure 
appropriate viral protein ratios. 

Multiple factors regulate the transcription process by modulating the template switch frequency 
during discontinuous transcription (9, 29). The most important one is the complementarity be- 
tween the TRS-L and the cTRS-B (17, 21). In a study of several coronaviruses, most sgmRNAs 
synthesized could be predicted in silico by local base-pairing calculations (17). Additional factors 
may regulate sgmRNA levels, such as TRS secondary structure, proximity to the 3’ end, and 
RNA-RNA or protein-RNA interactions (7, 9). In this sense, coronavirus N protein is required 
for efficient sgmRNA transcription (30, 31). Coronavirus N protein has RNA chaperone activity 
that drives template switching in vitro and may also facilitate template switching during coron- 
avirus transcription (31). Although nonessential for RNA synthesis, coronavirus nsp1 is associated 
with viral components of the replication-transcription complex (RTC) (32). Therefore, it may 
modulate coronavirus RNA synthesis similarly to arterivirus nsp1 protein, which modulates the 
relative abundance of sgmRNAs and gRNA (33). 

As components of the coronavirus RTC, cell proteins can also modulate sgmRNA ratios. 
Infectious bronchitis virus (IBV) N protein was recently shown to recruit cellular helicase DDX1 
to viral RTCs, facilitating TRS read-through and synthesis of long sgmRNAs (34). Interestingly, 
DDX1 recruitment requires N protein phosphorylation by cellular GSK3 kinase (34). Thus, the 
cell factor DDX1, attracted by phosphorylated N protein, provides a unique strategy for the 
transition from discontinuous to continuous transcription in coronaviruses to ensure balanced 
sgmRNA and full-length gRNA synthesis. 

Coronavirus protein ratios are also posttranscriptionally regulated. Most sgmRNAs are struc- 
turally polycistronic but functionally monocistronic, with only the 5’-most ORF being translated 
into a viral protein. The clearest example of coronavirus translational regulation is the expression
of the polyprotein pp1ab, which is generated by a programmed -1 ribosomal frame-shifting
mechanism (35). This process leads to minor levels of most of the RNA-modifying enzymes, en- 
coded by ORF 1b, in comparison with those of other replicase enzymes, such as proteases, encoded 
by ORF 1a. Alteration of coronavirus frame-shifting efficiency modified the ratio of replicase pro- 
teins, affecting viral RNA synthesis and virus production (36). In this sense, regulation of the ratio 
between the two viral polymerases nsp8 and nsp12, encoded by ORF 1a and ORF1b, respectively, 
may be involved in controlling the levels of the different sgmRNAs during viral RNA synthesis 
(37). 
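
The effect of a -1 frame shift on what the ribosome reads can be shown with a toy reading-frame walk. In the sketch below, the sequence, the slip position, and the function name `read_codons` are hypothetical; the code lists codons rather than amino acids and does not model how often the shift occurs, which in infection is only in a fraction of translation events.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_codons(mrna, slip_end=None):
    """List the codons read from position 0 until a stop codon.

    mrna     -- toy mRNA sequence, 5'->3', starting at the first codon
    slip_end -- if given, the index just past the codon after which the
                ribosome slips back by one nucleotide (-1 frame shift)
    """
    codons = []
    i = 0
    while i + 3 <= len(mrna):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
        i += 3
        if slip_end is not None and i == slip_end:
            i -= 1  # re-read the last nucleotide: shift to the -1 frame
    return codons


# Hypothetical toy sequence: the zero frame ends at a stop codon shortly
# after the slip site, whereas the -1 frame stays open for longer.
TOY_MRNA = "AUGGCUUUAAACGGGUAAGCCGAUAG"

if __name__ == "__main__":
    print(read_codons(TOY_MRNA))               # no frame shift
    print(read_codons(TOY_MRNA, slip_end=12))  # shift after the 4th codon
```

Without the shift, reading stops at the first in-frame stop codon, as for a product ending with ORF 1a; with the shift, the same message is read past that stop in the new frame, as for the extension into ORF 1b.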


ROLE OF DOUBLE-MEMBRANE VESICLES 


Like that of other positive-strand RNA viruses, coronavirus RNA synthesis is associated with 
extensively rearranged intracellular membranes (38). High-resolution three-dimensional images 
obtained by electron tomography in SARS-CoV-infected cells showed a unique reticulovesicu- 
lar network of modified endoplasmic reticulum that integrated convoluted membranes (CMs), 
interconnected double-membrane vesicles (DMVs), and vesicle packets apparently arising from 
DMV merger. Viral replicase subunits (nsp3, nsp5, and nsp8) localized to CMs, whereas dsRNA, 
presumably the replicative intermediate, mainly localized to the DMV interior, supporting the 
concept that the membrane network would contribute to protecting replicating RNA from antivi- 
ral defense mechanisms (38). In mouse hepatitis virus (MHV)-infected cells, newly synthesized 
RNA was detected in close proximity to DMVs and CMs (39), and viral RNA levels correlated 
with the number of DMVs (40-42). However, other data do not necessarily support the active 
contribution of DMVs to viral RNA synthesis. Nascent MHV RNAs colocalize with dsRNA only
at early times postinfection; at later times, the dsRNA distributed throughout the cell is apparently 
transcriptionally inactive (43). Furthermore, RdRp or nascent viral RNA has not been detected 
inside DMVs, and ultrastructural analysis could not confirm any connection between the DMV 
interior and the cytoplasm (38), raising questions about how ribonucleotide precursors are imported
into, and newly produced RNAs exported from, the sites of RNA synthesis (44). The coexpression of the
SARS-CoV transmembrane nonstructural proteins nsp3, nsp4, and nsp6 resulted in the formation 
of CMs and DMVs (45), suggesting a function in the biogenesis of the membranous replicative 
structures, and also in the anchoring of the RTC (46-48). 

In addition to DMVs, the gammacoronavirus IBV also induces different membrane structures 
such as spherules tethered to zippered endoplasmic reticulum (49). Unlike any previously identified 
coronavirus-induced structure, IBV spherules contain a pore connecting their interior to the cell 
cytoplasm (50). 

The function and dynamics of DMVs and CMs and the precise localization of the sites of 
active viral RNA synthesis are still unresolved questions, and further studies are required. A 
possible model proposes that DMVs may be the initial sites of active RNA synthesis early in 
infection, whereas at later times, after membrane connections are lost, RNA synthesis shifts to the 
CMs, and DMVs become end-stage products that sequester nonfunctional dsRNAs to prevent 
the stimulation of the innate immune response (51, 52). 


STRESS GRANULES AND PROCESSING BODIES IN 
REPLICATION-TRANSCRIPTION COMPLEX ACTIVITY 


Stress granules and processing (P) bodies are cytoplasmic RNA granules that contain transla- 
tionally silenced messenger ribonucleoproteins, contributing to translation regulation in cells. 
Whereas P bodies are constitutively expressed and include components involved in mRNA decay, 
stress granules are thought to be sites of mRNA storage and triage formed in response to stress 
conditions. Stress granules represent an intermediate stage in the dynamic equilibrium between 
active translation on free polysomes and mRNA decay in P bodies (53, 54). 

During infection, RNA viruses dynamically interact with stress granules and P bodies (55), lead- 
ing to varying stress granule phenotypes. Many viruses have evolved mechanisms to antagonize 
the formation of stress granules, suggesting that stress granules are involved in restricting virus 
replication through RNA silencing (56, 57). In contrast, other RNA viruses, such as respiratory 
syncytial virus, induce stress granule formation and take advantage of stress granule responses as 
part of the infectious cycle (58). For coronaviruses, MHV replication was found to be enhanced in 
cells deficient in stress granule formation, implying that stress granules contribute to viral inhibition
(59). TGEV induced stress granules that persisted from 7 to 16 h postinfection (hpi), which was correlated with
a decrease in viral replication and transcription (60). These granules contained the stress granule 
markers T cell intracellular antigen 1 (TIA-1), TIA-1-related protein (TIAR), and polypyrimidine 
tract-binding protein (PTB) in association with viral gRNA and sgmRNAs. TGEV-induced stress
granules might contribute to the spatiotemporal regulation of viral RNA synthesis. Several stress 
granule proteins (including caprin and G3BP) have been associated with IBV N protein, pointing 
to the relevance of these RNA-protein complexes in the regulation of coronavirus gene expression 
(61). 

A new hypothesis postulates that stress granules are involved in an integrated stress-innate
immunity activation response (57, 62). In this pathway, viral RNA and proteins, along with host 
pathogen-sensing factors, such as the dsRNA-binding protein kinase R (PKR) and the RNA 
helicases retinoic acid-induced gene 1 (RIG-I) and melanoma differentiation-associated gene 5
Table 1 Nidovirus proteins that localize to the nucleus


Family          Virus       Protein   Reference(s)

Coronaviridae   IBV         N         69
                TGEV        N         69
                MHV         N         69
                IBV         3b        74
                SARS-CoV    3b        76
                MERS-CoV    4b        75, 78
                SARS-CoV    6         79, 159
                SARS-CoV    9b        81, 160
                TGEV        nsp1      82

Arteriviridae   EAV         N         85
                PRRSV       N         88, 92, 93, 161
                EAV         nsp1      85
                PRRSV       nsp1      89, 91

Virus structural proteins: nucleocapsid (N), 3b, 4b, 6, and 9b. Virus nonstructural protein: nonstructural protein 1 (nsp1). 
Virus name abbreviations: EAV, equine arteritis virus; IBV, infectious bronchitis virus; MERS-CoV, Middle East
respiratory syndrome coronavirus; MHV, mouse hepatitis virus; PRRSV, porcine reproductive and respiratory syndrome 
virus; SARS-CoV, severe acute respiratory syndrome coronavirus; TGEV, transmissible gastroenteritis virus. 


(MDA5), can be sequestered in stress granules (63). Additional insight into the relevance of stress
granules and P bodies for the regulation of coronavirus RNA synthesis is still required. 


RELEVANCE OF THE CELL NUCLEUS IN CORONAVIRUS 
RNA SYNTHESIS 


All positive-strand RNA viruses that infect animals replicate in the cytoplasm of the infected host 
cell. However, there is ample evidence that implicates the nucleus and nuclear proteins in the 
replication and pathogenesis of positive-strand RNA viruses, including coronaviruses (64). The 
replication of these RNA viruses in enucleated cells is variable, ranging from 10% to 100% of 
that in nucleated controls (65, 66). The relocation of nuclear proteins to the cytoplasm and of 
viral proteins to the nucleus during virus replication (7, 64, 67) (Table 1) highlights the relevance 
of this organelle during the coronavirus infectious cycle and raises important questions: What is 
the role of nuclear factors in the replication of these viruses, and do viral proteins traveling to the 
nucleus participate in RTC activity? 

The coronavirus protein most frequently associated with the host cell nucleus is the N protein, 
and its transport to the nucleus is regulated by phosphorylation (68). N protein nuclear localiza- 
tion is associated with induction of cell cycle arrest and inhibition of cytokinesis (68-72) and is 
involved in recruitment to the cytoplasm of cell nuclear proteins, such as heterogeneous nuclear 
ribonucleoprotein A1 (hnRNP A1) and the helicase DDX1 (34, 73). As noted above, N protein-recruited
DDX1 functions in the RTC, facilitating TRS read-through and synthesis of long
sgmRNAs (34). The 3b proteins of IBV (74) and SARS-CoV (75, 76), though different in nature, 
have also been located in part in the nuclei of transfected or infected cells. Following nuclear local- 
ization, SARS-CoV 3b protein traffics to the outer membrane of mitochondria, where it inhibits 
the induction of type 1 interferon (IFN) elicited by RIG-I and the mitochondrial antiviral signal- 
ing protein (77). Similarly, the 4b proteins of MERS-CoV, bat coronavirus (BtCoV)-HKU4, and 
BtCoV-HKU5 also localize to the nucleus and inhibit type 1 IFN induction and, less efficiently,
NF-κB signaling pathways (78).

SARS-CoV proteins 6 and 9b affect nucleocytoplasmic transport. Protein 6 impedes nuclear 
import of factors such as STAT1 (79) and antagonizes IFN signaling pathways (80). Protein 9b
shuttles from the nucleus by its interaction with cellular exportin 1 (Crm1), which is essential for 
proper protein 9b degradation, and blocking nuclear export of protein 9b induces cell apoptosis 
(81). 

In TGEV-infected cells, nsp1 is distributed in both the nucleus and the cytoplasm (82), which 
is not surprising as it can freely diffuse into the nucleus because of its small molecular weight 
(~9 kDa) (83). In contrast to TGEV nsp1, both MHV nsp1 and SARS-CoV nsp1 are localized 
exclusively in the cytoplasm of virus-infected cells (83). Due to its binding to the 40S ribosomal 
subunit, nsp1 inhibits cellular mRNA translation in some cases (HCoV-229E and HCoV-NL63) 
but not in others (TGEV) (82, 83). In addition, nsp1 inhibits IFN induction and signaling (83). 

Arterivirus nsp1 and N proteins also localize in the cytoplasm and the nucleus of infected cells 
(84, 85). Porcine reproductive and respiratory syndrome virus (PRRSV) N protein accumulates in 
the nucleoli of infected cells, where it interacts with the host cell proteins fibrillarin, nucleolin, and 
poly(A)-binding protein (PABP), the latter of which is transported to the nucleus during infection 
(86, 87). PRRSV N protein also activates the NF-κB pathway and enhances its nuclear localization.
The presence of N protein in the nucleus seems important for PRRSV, as removal of its nuclear
localization signal significantly attenuates the virus (88). The nsp1 protein interferes with IRF3-mediated
IFN activity in the nucleus and with the NF-κB-mediated pathway in the cytoplasm
(89-91). The nsp1 subunit of nsp1 suppresses the JAK-STAT pathway and also interacts with
protein inhibitor of activated STAT1 (PIAS1) (92, 93). Because PIAS1 is a nuclear protein with
multiple functions, its interaction with nsp1 may lead to the modulation of several host cell 
pathways. 

Coronavirus and arterivirus proteins, like those from other cytoplasmic RNA viruses (65), 
interact with host cell proteins, modifying their nuclear-cytoplasmic localization and thereby 
affecting viral replication levels and modulating innate immune responses. Thus, nuclear proteins
such as hnRNP A1 and PTB accumulate in the cytoplasm of cells infected by MHV and TGEV,
respectively (60, 94). These proteins bind to TRSs and to the 5’ end of the viral genome (95, 96); 
PTB additionally reduces coronavirus RNA accumulation (60). 

Other nuclear proteins, including the p100 transcriptional coactivator, PABP, and certain 
members of the hnRNPs such as hnRNP Q, showed preferential binding to the 3’ end of the 
coronavirus genome and a positive effect on coronavirus RNA synthesis (95, 97). The contribution 
of these proteins to host cell interactions in TGEV infection is supported by the formation of a 
complex including glyceraldehyde 3-phosphate dehydrogenase (GAPDH), glutamyl-prolyl-tRNA 
synthetase (EPRS), hnRNP Q, and the ribosomal protein L13a, which regulates the expression 
of inflammatory genes (98, 99). Similarly, in infections by other RNA viruses, several nuclear 
proteins (La, Sam68, PTB, proteasome activator PA28γ, and nucleolin) also relocalized to the
cytoplasm and were involved in virus replication (65). 


RNA GENOME 5’ AND 3’ CIS-ACTING ELEMENTS INVOLVED IN 
CORONAVIRUS RNA SYNTHESIS 


Similar to that of other positive-strand RNA viruses, coronavirus RNA synthesis requires the 
specific recognition of cis-acting RNA elements, which are mainly located in the highly structured 
5’ and 3’ untranslated regions (UTRs), although they may also extend into the adjacent coding 
sequences (9, 100, 101). Such cis-acting RNA elements in the 5’ end of the coronavirus genome were 
Figure 4


Coronavirus cis-acting RNA elements. The higher-order RNA structures indicated in the diagram are mainly based on studies done in
betacoronaviruses. The core sequence within the leader transcription-regulating sequence is shown as an orange box on the top of SL3.
Abbreviations: BSL, bulged stem loop; HVR, hypervariable region; L1, loop 1 of the pseudoknot; N, nucleocapsid; Oct, conserved
octanucleotide; PK, pseudoknot; S1, stem 1 of the pseudoknot; SL, stem loop; UTR, untranslated region.


first studied in BCoV using defective interfering RNAs. More recently, these studies have been 
extended to other betacoronaviruses and, to a lesser extent, to alpha- and gammacoronaviruses. 
These RNA elements mainly consist of stem-loop (SL) structures that present a varying degree of 
conservation among different coronaviruses (9, 101-103). In order to consolidate the information 
from various publications, we adopt a uniform nomenclature for the 5’ cis-acting RNA elements 
(Figure 4) based on that used by the Leibowitz, Giedroc, and Olsthoorn laboratories (102, 103). 
SL1 and SL2 are conserved in all coronaviruses (101, 102). SL1 adopts a bipartite structure 
(104), and SL2 presents a highly conserved loop sequence that adopts a YNMG- or CUYG-type 
tetraloop conformation (105, 106). Both SLs are required for genome replication, and they may 
have a specific role in sgmRNA synthesis (104-106). SL3, which contains the leader CS, is involved 
in discontinuous transcription as a receptor for nascent negative-strand RNA. Downstream of the 
leader CS is SL4, a long hairpin that is structurally conserved in all coronavirus genera (101, 102). 
SL4 is essential for replication of BCoV defective interfering RNA (107), and it was proposed 
to function as a spacer element that controls the orientation of upstream SLs driving sgmRNA 
synthesis (108). Finally, a higher-order RNA structure (SL5) that extends into ORF1a seems 
to be conserved within specific coronavirus genera (102). In alphacoronaviruses, SL5 may be 
further subdivided into three hairpins (SL5a, SL5b, and SL5c), which are partially conserved in 
betacoronaviruses (101). In gammacoronaviruses, a possible SL5 is predicted to adopt a rod-like 
structure (102). 

Initial studies using defective interfering RNAs from alpha-, beta-, and gammacoronaviruses 
delimited the 3’ cis-acting RNA elements required for coronavirus RNA synthesis to the 3’ 
UTR plus the poly(A) tail (100). Further investigations allowed the identification and functional 
characterization of these 3’ cis-acting RNA elements (reviewed in 9, 101) (Figure 4). Immediately 
downstream of the N gene stop codon, there are two overlapping, essential RNA structures 
consisting of a bulged stem loop (BSL) and a hairpin-type RNA pseudoknot (PK), which are 
structurally and functionally conserved in all betacoronaviruses (9, 100). An intriguing property 
of the BSL and the PK is that, because they overlap by 5 nt, they cannot simultaneously fold up 
completely, which has led to the speculation that each element may adopt alternate configurations, 
acting as a molecular switch that operates at some stage of RNA synthesis (109). In addition, 
evidence for a direct interaction of the PK loop 1 (L1) with the 3’ end of the genome and with 
nsp8 and nsp9 has been reported (110). Based on these studies, a model for the initiation of 
coronavirus negative-strand RNA synthesis was proposed (Figure 5). In this model, the binding 




Figure 5 


Coronavirus replication-transcription complex. (a) After binding of the nsp8, nsp7, and nsp9 complex to the 
genomic RNA 3’ end, the nsp8 primase activity initiates RNA synthesis de novo. This leads to a 
conformational change in the 3’-end RNA structure, allowing transition from a BSL to a PK folding. (b) PK 
formation allows binding of the RNA-dependent RNA polymerase (nsp12) complex, including helicase and 
nsp14-nsp10. This core replication-transcription complex includes polymerase activity (nsp12 and nsp8), 
processivity factors (nsp7 and nsp8), and proofreading activity (nsp14 and nsp10). Abbreviations: BSL, 
bulged stem loop; HVR, hypervariable region; nsp, nonstructural protein; PK, pseudoknot. 
of a protein complex (including nsp8 and nsp9) to the stem formed by base-pairing of the PK 
L1 and the 3’ end of the genome may cause a conformational shift that leaves free the 3’ end of 
the genome and disrupts the lower stem of the BSL, leading to the formation of the PK. In this 
new conformation, the 3’ end of the genome is recognized by the RdRp and associated factors, 
promoting the initiation of negative-strand RNA synthesis (110). Further studies are required 
to confirm this model and to analyze whether or not alpha- and gammacoronaviruses, which 
lack either the BSL or the PK, respectively (101, 111, 112), employ a similar molecular switch 
mechanism. 
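
The mutual exclusivity of the BSL and the PK can be illustrated with a minimal sketch that treats each element as an inclusive range of nucleotide positions. The coordinates below are hypothetical placeholders chosen only so that the two ranges share 5 nt, as described in the text; they are not taken from any coronavirus genome, and the function names are illustrative.

```python
def shared_positions(a, b):
    """Count nucleotide positions common to two inclusive (start, end) ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def can_fold_together(a, b):
    """Two elements can both fold completely only if they share no positions."""
    return shared_positions(a, b) == 0


# Hypothetical coordinates (assumption): chosen to overlap by exactly 5 nt.
BSL = (1, 68)
PK = (64, 118)

print(shared_positions(BSL, PK))   # number of shared nucleotides
print(can_fold_together(BSL, PK))  # whether both can fold at the same time
```

Because the two ranges share positions, only one of the two structures can be fully formed at a time, which is the basis of the proposed switch between the BSL and PK conformations.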

Downstream of the PK there is a hypervariable region (HVR) that is highly divergent in se- 
quence and structure among coronaviruses but contains a universally conserved octanucleotide 
sequence in a single-stranded region (Figure 4) (101, 112-114). In MHV, the HVR was predicted 
to fold in a complex multiple stem-loop structure, which is nonessential for RNA synthesis in cell 
culture but affects pathogenicity in vivo (110, 115). Although the strict conservation of the octanu- 
cleotide sequence suggests an important functional role, the activity of this sequence remains to be 
determined. Finally, MHV and BCoV defective interfering RNA studies have provided evidence 
that the poly(A) tail is another cis-replication signal that requires a minimum of 5 to 10 adenylate 
residues to be functional, probably via interaction with PABP (116). 
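
As a simple illustration of the poly(A) length requirement, the sketch below counts the 3’-terminal adenylate residues of an RNA sequence and compares the count with a threshold. The function names and the default threshold of 10 are assumptions made for illustration; the text reports only that a minimum of 5 to 10 residues is needed, so the cutoff is left as a parameter.

```python
def poly_a_length(rna):
    """Return the number of consecutive A residues at the 3' end of the sequence."""
    seq = rna.upper()
    return len(seq) - len(seq.rstrip("A"))


def has_functional_tail(rna, min_a=10):
    """Check the tail against a minimum length (min_a is an assumed cutoff)."""
    return poly_a_length(rna) >= min_a


# Toy sequences (hypothetical), written 5' to 3'.
print(poly_a_length("GGAUC" + "A" * 6))
print(has_functional_tail("GGAUC" + "A" * 6, min_a=5))
print(has_functional_tail("GGAUC" + "A" * 6, min_a=10))
```

Lowering `min_a` to 5 reproduces the permissive end of the reported range, whereas 10 reproduces the strict end.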


CELLULAR AND VIRAL PROTEINS OF THE CORONAVIRUS 
REPLICATION-TRANSCRIPTION COMPLEX 


Most of the nsps encoded in the replicase gene, together with the N protein and an unknown 
number of cellular proteins, assemble into a membrane-associated viral RTC that mediates both 
genome replication and the synthesis of the nested set of sgmRNAs (5, 7, 9, 30, 31, 117-119). Of 
these, the key enzymes involved in coronavirus RNA synthesis are the RdRp (nsp12), the helicase 
(nsp13), and the nsp7-nsp8 complex, which is a processivity factor for the RdRp (Figure 5). 

The RdRp domain, located in the C-terminal region of nsp12, contains all conserved motifs of 
canonical RdRps, including the palm, fingers, and thumb domains (120, 121). In addition to the 
RdRp domain, nsp12 also contains an N-terminal domain that is essential for RdRp activity (122) 
and probably interacts with nsp5, nsp8, and nsp9 (123). In vitro, full-length nsp12 drives RNA 
synthesis in a primer-dependent manner on both homo- and heteropolymeric RNA templates 
(124, 125). However, the in vitro nsp12 RdRp activity is weak and nonprocessive, in contrast with 
the efficient replication of the RNA genome in vivo. 

Coronavirus nsp8 bears a second, noncanonical RdRp activity that synthesizes short oligonu- 
cleotides (<6 nt), acting as an RNA primase that produces the primers required for nsp12-mediated 
RNA synthesis (37, 126). Structural studies have shown that nsp8 interacts with nsp7, forming a 
hexadecameric protein complex (eight molecules of each nsp) that contains a channel capable of 
encircling RNA due to its internal dimensions and electrostatic properties (127, 128). This com- 
plex, which is active in both de novo initiation and primer extension (126), confers processivity to 
the RdRp in an in vitro assay using purified proteins (129). The nsp7-nsp8-nsp12 complex formed 
in vitro is able to catalyze de novo synthesis of relatively long RNAs (up to 340 nt) in a processive 
manner. Also interacting with nsp8 is nsp9, a small protein that binds ssRNA without sequence 
specificity (130-132). Dimerization of nsp9 is critical for virus replication (133), and its structural 
and functional features strongly suggest that it could be a component of the RTC catalytic core, 
stabilizing viral RNAs during RNA synthesis and processing. 

Coronavirus nsp13 contains a superfamily 1 helicase domain linked to an N-terminal zinc- 
binding domain (134) that is essential for helicase activity in vitro (135). The protein is able to 
unwind dsRNA and dsDNA in a 5’-to-3’ direction with the energy obtained by the hydrolysis 
of all NTPs and dNTPs (136-140). It was proposed that the resulting ssRNAs probably serve as 
templates for RNA synthesis by the RdRp. Besides NTPase and dNTPase activities, coronavirus 
helicases also possess RNA 5’-triphosphatase (RTPase) activity, which may be involved in viral 
mRNA capping (136, 138). Obviously, the nsp13 5’-to-3’ helicase activity does not fit the expected 
3’-to-5’ polarity required to separate secondary structures in the RNA template during negative- 
strand synthesis (118). Thus, cellular helicases may bind the RTC to assist coronavirus proteins 
in 3’-to-5’ unwinding. SARS-CoV nsp13 has been shown to interact specifically with the cellular 
RNA helicase DDX5, which is involved in coronavirus RNA synthesis (141). In addition, the 
cellular helicase DDX1 is recruited to RTCs, and the effects of DDX1 expression knockdown 
indicate that it might be an essential cofactor for coronavirus RNA replication and transcription 
(34, 142). Interestingly, the helicase activity of nsp13 is enhanced 2-fold by nsp12 through direct 
protein-protein interaction (143), suggesting that interaction of these proteins in a functional 
RTC improves the efficiency of viral RNA synthesis. 


Coronavirus Proofreading System 


The replication and maintenance of the coronavirus genome, the largest known viral RNA, is 
a hallmark of nidoviruses. These viruses encode a unique set of RNA-modifying activities that 
are not present in other viral RNA genomes. One of them is the exonuclease activity of nsp14 
(ExoN), related to the DEDD superfamily of exonucleases (144). In addition to the N-terminal 
ExoN domain, nsp14 also contains a C-terminal N7-methyltransferase (N7-MTase) domain, 
which is involved in viral mRNA capping (145). 

Coronavirus nsp14 ExoN activity was proposed to be part of an RNA proofreading machinery 
during coronavirus replication (146), and accumulating data support this role (147). Phylogenet- 
ically, only long-size nidovirus genomes encode an ExoN activity, and acquisition of this activity 
seems crucial to allow nidovirus genome expansion. The discovery of insect nidoviruses with a 
20-kb RNA genome, encoding ExoN function, strongly reinforced this idea (148). In addition, 
point mutations in the catalytic ExoN residues led to coronaviruses with altered replication fi- 
delity and a mutator phenotype, as they have 15- to 20-fold higher mutation accumulation rates 
(149, 150). As a proofreading component, ExoN activity should contribute to the removal of mis- 
incorporated nucleotides. In fact, nsp14 ExoN activity efficiently removes a mismatched 3’-end 
nucleotide, mimicking an RdRp misincorporation product (151). At the same time, coronaviruses 
are relatively resistant to mutagens such as ribavirin and 5-fluorouracil, whereas coronaviruses 
with reduced ExoN activity are highly susceptible to these agents (152). These findings were the 
first experimental evidence supporting nsp14 ExoN activity in a coronavirus proofreading system. 

The nsp14 protein is part of the RTC core complex, formed by nsp12 (RdRp) and the nsp7- 
nsp8 processivity factor, providing proofreading and capping activities (129). Interestingly, nsp10 
is able to enhance ExoN activity up to 35-fold in vitro (151), binding nsp14 either alone or in the 
nsp7-nsp8-nsp12-nsp14 complex (129). The involvement of nsp10 in coronavirus RNA synthesis 
was first reported from the analysis of MHV mutants (153). More recently, it has been shown that 
nsp10 acts as a cofactor of both nsp14 ExoN and nsp16 methyltransferase (MTase) activities (154). 
Moreover, as nsp14 and nsp16 bind to overlapping nsp10 sites, nsp10 might act as a molecular 
switch, mediating interactions between RNA and proteins from both proofreading and mRNA 
capping machineries. 


Coronavirus RNA Capping Pathway 


Capping of viral RNAs by conventional or unconventional pathways (reviewed in 155) leads to 
5’-end cap structures that allow efficient viral protein synthesis and, in many cases, escape from the 
innate immune system. Coronaviruses follow the canonical capping pathway, which consists of 
four sequential enzymatic reactions: (a) RTPase, encoded by the nsp13 helicase, hydrolyzing the 
γ-phosphate of the mRNA; (b) an as-yet-unidentified guanylyltransferase (GTase) adding GMP 
to the 5’-diphosphate RNA; (c) nsp14 N7-MTase methylating the guanosine, leading to a cap-0 
structure that is essential for efficient translation initiation; and (d) nsp16 2’-O-methyltransferase 
(2’-O-MTase) carrying out further methylations, leading to cap-1 and cap-2 structures, which 
are required to efficiently escape the nonself RNA recognition system of the host cell (156). 
Interestingly, whereas nsp10 binding has no effect on nsp14 N7-MTase activity, nsp10 is required 
for nsp16 2’-O-MTase activity. These data, in conjunction with those in the preceding section, 
highlight the importance of nsp10 as a modulator of two different activities in the coronavirus 
proofreading and capping machinery. 
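
The sequential nature of the capping pathway can be sketched as an ordered series of steps, each requiring a particular set of factors and a particular 5’-end substrate. This is only a schematic model: the state labels, the `GTase` placeholder for the unidentified guanylyltransferase, and the function name are illustrative assumptions, and the final step is simplified to cap-1 even though further methylation to cap-2 is described in the text.

```python
# Each step: (factors required, 5'-end substrate, 5'-end product).
# Labels are illustrative; "GTase" stands in for the unidentified enzyme.
CAPPING_STEPS = [
    ({"nsp13"}, "pppN-RNA", "ppN-RNA"),        # (a) RTPase removes the gamma-phosphate
    ({"GTase"}, "ppN-RNA", "GpppN-RNA"),       # (b) GMP is added to the diphosphate end
    ({"nsp14"}, "GpppN-RNA", "cap-0"),         # (c) N7-methylation of the guanosine
    ({"nsp16", "nsp10"}, "cap-0", "cap-1"),    # (d) 2'-O-methylation, nsp10 required
]


def run_capping(available, state="pppN-RNA"):
    """Advance through the pathway until a required factor is missing."""
    factors = set(available)
    for required, substrate, product in CAPPING_STEPS:
        if state != substrate or not required <= factors:
            break
        state = product
    return state


all_factors = {"nsp13", "GTase", "nsp14", "nsp16", "nsp10"}
print(run_capping(all_factors))
print(run_capping(all_factors - {"nsp10"}))
```

In this model, removing nsp10 stalls the pathway at cap-0, reflecting the observation that nsp10 is dispensable for nsp14 N7-MTase activity but required for nsp16 2’-O-MTase activity.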


Encapsidation of the Coronavirus Replication-Transcription Complex 


It is currently accepted that, unlike that in negative-strand RNA viruses, the RTC in positive- 
strand RNA viruses generally is not incorporated into viral particles. However, recent studies 
based on proteomic, biochemical, and immunoelectron microscopy assays reported the presence 
of RdRp, nsp2, nsp3, and nsp8 in TGEV particles (157) and nsp2, nsp3, and nsp5 in SARS- 
CoV particles (158). These data suggest that the RTC might be encapsidated in coronaviruses. 
It is speculated that the encapsidated RTC could act as a starting replication machinery, with a 
round of genome amplification before translation leading to improved efficiency of virus infec- 
tion. Further studies are required to investigate whether other viral and cellular components 
of the RTC are also encapsidated and what biological role they play in the coronavirus life 
cycle. 


SUMMARY POINTS 


1. Coronaviruses express their 3’-proximal ORFs through a collection of overlapping, 
nested sgmRNAs generated by a mechanism of discontinuous transcription unique 
among RNA viruses. This process includes a template switch during the synthesis of 
negative-strand sgRNAs to add a copy of the leader sequence, located at the genome 5’ 
end. 


2. Coronavirus transcription is regulated by multiple factors, including the extent of base- 
pairing between the complement of the TRS-B in the nascent negative strand and the 
TRS-L as well as protein-RNA and RNA-RNA interactions. Moreover, coronavirus N 
protein RNA chaperone activity is essential for efficient transcription. 


3. Coronavirus RNA synthesis is associated with extensive modification of intracellular 
membranes, including DMVs and CMs. 


4. The requirement of the nucleus for coronavirus replication is variable, but optimum levels 
of progeny are obtained only in its presence. Several coronavirus proteins involved in 
RNA synthesis travel to the nucleus. Conversely, many nuclear proteins are transported 
to the cytoplasm to facilitate coronavirus RNA synthesis. 


5. Coronavirus cis-acting RNA elements involved in RNA synthesis are mainly located in 
the highly structured 5’ and 3’ UTRs. 


6. The replicase proteins nsp7, nsp8, nsp12, and nsp14 may constitute an RTC core 
complex. 


7. Coronaviruses encode a proofreading machinery, unique among the RNA viruses, to 
ensure the maintenance of their large genome size. The ExoN activity of nsp14 is a key 
element of the proofreading system. 


FUTURE ISSUES 


1. Further research is required on cis-acting elements involved in replication and transcrip- 
tion, and on the viral and cellular proteins that bind them. 


2. The function and dynamics of DMVs and CMs and the precise localization of the sites 
of active viral RNA synthesis remain unresolved questions. 


3. The contribution of cytoplasmic RNA-protein complexes containing viral RNAs, such 
as stress granules, to the regulation of coronavirus RNA expression requires further 
research. 


4. Limited information is available on the temporal regulation of viral translation, replica- 
tion, and transcription over the course of infection and on how switching between these 
processes occurs. 


5. In vitro reconstitution of the RTC core complex will allow the study of the coronavirus 
proofreading mechanism, the temporal or spatial regulation of proofreading and capping 
activities, which share several viral components, and the role of N protein RNA chaperone 
activity. 


6. Understanding the interaction of cell and viral proteins within the nucleus, and of nuclear 
proteins traveling to the cytoplasm to interact with viral factors, may provide novel 
avenues to clarify coronavirus replication. 


7. It is still unknown whether replication and transcription are simultaneous or sequential 
processes. 